Synthetic voice


MND left her without a voice. Eight seconds of scratchy audio gave it back to her

BBC News

"After such a long time, I couldn't really remember my voice," Sarah Ezekiel tells BBC Access All. "When I first heard it again, I felt like crying." The onset of motor neurone disease (MND) left Sarah without a voice and the use of her hands at the age of 34. It was within months of her becoming a mum for the second time.


Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices

Székely, Éva, Miniota, Jūra, Hejná, Míša

arXiv.org Artificial Intelligence

The growing prevalence of conversational voice interfaces, powered by developments in both speech and language technologies, raises important questions about their influence on human communication. While written communication can signal identity through lexical and stylistic choices, voice-based interactions inherently amplify socioindexical elements - such as accent, intonation, and speech style - which more prominently convey social identity and group affiliation. There is evidence that even passive media such as television are likely to influence the audience's linguistic patterns. Unlike passive media, conversational AI is interactive, creating a more immersive and reciprocal dynamic that holds a greater potential to impact how individuals speak in everyday interactions. Such heightened influence can be expected to arise from phenomena such as acoustic-prosodic entrainment and linguistic accommodation, which occur naturally during interaction and enable users to adapt their speech patterns in response to the system. While this phenomenon is still emerging, its potential societal impact could provide organisations, movements, and brands with a subtle yet powerful avenue for shaping and controlling public perception and social identity. We argue that the socioindexical influence of AI-generated speech warrants attention and should become a focus of interdisciplinary research, leveraging new and existing methodologies and technologies to better understand its implications.


Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology

Moell, Birger, Aronsson, Fredrik Sand

arXiv.org Artificial Intelligence

This study explores voice cloning to generate synthetic speech replicating the unique patterns of individuals with dysarthria. Using the TORGO dataset, we address data scarcity and privacy challenges in speech-language pathology. Our contributions include demonstrating that voice cloning preserves dysarthric speech characteristics, analyzing differences between real and synthetic data, and discussing implications for diagnostics, rehabilitation, and communication. We cloned voices from dysarthric and control speakers using a commercial platform, ensuring gender-matched synthetic voices. A licensed speech-language pathologist (SLP) evaluated a subset for dysarthria, speaker gender, and synthetic indicators. The SLP correctly identified dysarthria in all cases and speaker gender in 95% but misclassified 30% of synthetic samples as real, indicating high realism. Our results suggest synthetic speech effectively captures disordered characteristics and that voice cloning has advanced to produce high-quality data resembling real speech, even to trained professionals. This has critical implications for healthcare, where synthetic data can mitigate data scarcity, protect privacy, and enhance AI-driven diagnostics. By enabling the creation of diverse, high-quality speech datasets, voice cloning can improve generalizable models, personalize therapy, and advance assistive technologies for dysarthria. We publicly release our synthetic dataset to foster further research and collaboration, aiming to develop robust models that improve patient outcomes in speech-language pathology.


Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model

Lehečka, Jan, Hanzlíček, Zdeněk, Matoušek, Jindřich, Tihelka, Daniel

arXiv.org Artificial Intelligence

In this paper, we experimented with the SpeechT5 model pre-trained on large-scale datasets. We pre-trained the foundation model from scratch and fine-tuned it on a large-scale robust multi-speaker text-to-speech (TTS) task. We tested the model's capabilities in zero- and few-shot scenarios. Based on two listening tests, we evaluated the synthetic audio quality and how closely the synthetic voices resemble real voices. Our results showed that the SpeechT5 model can generate a synthetic voice for any speaker using only one minute of the target speaker's data. We successfully demonstrated the high quality and similarity of our synthetic voices on publicly known Czech politicians and celebrities.


AI Can't Make Music

The Atlantic - Technology

The first concert I bought tickets to after the pandemic subsided was a performance of the British singer-songwriter Birdy, held last April in Belgium. I've listened to Birdy more than to any other artist; her voice has pulled me through the hardest and happiest stretches of my life. I know every lyric to nearly every song in her discography, but that night Birdy's voice had the same effect as the first time I'd listened to her, through beat-up headphones connected to an iPod over a decade ago--a physical shudder, as if a hand had reached across time and grazed me, somehow, just beneath the skin. Countless people around the world have their own version of this ineffable connection, with Taylor Swift, perhaps, or the Beatles, Bob Marley, or Metallica. My feelings about Birdy's music were powerful enough to propel me across the Atlantic, just as tens of thousands of people flocked to the Sphere to see Phish earlier this year, or some 400,000 went to Woodstock in 1969.


Scarlett Johansson Says OpenAI Ripped Off Her Voice for ChatGPT

WIRED

Last week OpenAI revealed a new conversational interface for ChatGPT with an expressive synthetic voice strikingly similar to that of the AI assistant played by Scarlett Johansson in the sci-fi movie Her--only to suddenly disable the new voice over the weekend. On Monday, Johansson issued a statement claiming to have forced that reversal, after her lawyers demanded OpenAI clarify how the new voice was created. Johansson's statement, relayed to WIRED by her publicist, claims that OpenAI CEO Sam Altman asked her last September to provide ChatGPT's new voice but that she declined. She describes being astounded to see the company demo a new voice for ChatGPT last week that sounded like her anyway. "When I heard the release demo I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference," the statement reads.


Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations

Pinhanez, Claudio, Fernandez, Raul, Grave, Marcelo, Nogima, Julio, Hoory, Ron

arXiv.org Artificial Intelligence

The scarcity of African American-sounding voices in current TTS platforms poses challenges for applications interested in targeting specific demographics (e.g., an African American business or NGO; a voice-tutoring system for children who are not of White ethnicity, etc.). The ultimate goal of the project described in this paper is to provide to designers, developers, and enterprises the choice of having a professional voice which is clearly recognizable as African American, and therefore more able to address diversity and inclusiveness issues. Being more precise, our goal is to create an African American Text-to-Speech system, which we will refer to simply as an African American voice or AA voice, able to produce synthetic audio segments from standard English texts, and which will be recognized by African American speakers and non-speakers as sounding like a native African American speaker. The AA voice should exhibit a level of technical quality similar to the Standard American English (SAE) synthetic voices currently available through professional platforms. The evaluation of the technical quality of the AA voice, however, is not addressed in this paper, which focuses primarily on whether the AA voice can be recognized as sounding like an African American speaker. Linguists [27, 28] have described a continuum of dialects under what is often termed African American Vernacular English (AAVE). At one end of the spectrum, one finds the largest deviation from SAE in terms of lexicon (including slang), syntax and morphology, and phonological/phonetic properties. At the other end, AAVE speakers begin to approach SAE in terms of lexicon and grammar but still retain marked speech characteristics (primarily in terms of intonation, phonation, and vowel placement [14, 28]) which grant the speech a distinctive identity which listeners use as cues in the perception of African American English [44].


Meta unveils Voicebox AI: Should we all be worried?

FOX News

Meta's latest artificial intelligence model, called Voicebox, is a customized text-to-speech product that can mimic any specific voice of your choosing.


AI generated Joe Rogan podcast stuns social media with 'terrifying' accuracy: 'Mind blowingly dangerous'

FOX News

The video was made with ChatGPT and is not the actual words of Joe Rogan or Sam Altman. Artificial intelligence chatbot ChatGPT has created a 51-minute episode of The Joe Rogan Experience featuring nearly flawless representations of the podcast host's voice and the voice of OpenAI CEO Sam Altman. The episode begins with an AI-generated Rogan welcoming the audience to the first episode of the "Joe Rogan AI Experience," speaking in a manner and tone that is difficult to distinguish from the real person. "I'm your host, Joe Rogan, or at least that's what this AI model thinks I sound like. Let me tell you, folks, this is some next-level stuff we've got going on here today," faux Rogan continues.


'Of course it's disturbing': will AI change Hollywood forever?

#artificialintelligence

What will AI (artificial intelligence) do to Hollywood? Who better to answer that question than ChatGPT, a thrilling but scary chatbot developed by OpenAI. When the Guardian asked it about AI's potential impact on the film industry, it made the following points:

Scriptwriting: AI can be used to analyze existing screenplays and create new ones, potentially leading to more efficient and cost-effective screenwriting.

Pre-production: AI can be used to streamline the pre-production process, including casting, location scouting and storyboarding.

Special effects: AI can be used to create more realistic and immersive special effects, potentially reducing the need for practical effects and saving time and money in post-production.